Real-time systems: use case

Figure: (© Bosch)

Primary concern is bounded latency.
Then we also care about performance, power consumption...

> WCET: Worst-case execution time.
> WCRT: Worst-case response time, i.e. WCET + maximum blocking time.


Figure: timeline of tasks $\tau_1$ and $\tau_2$, marking Exec. time($\tau_2$) and Response time($\tau_2$).

System: set of tasks $\tau_i = (C_i, D_i, P_i)$
$C_i$: Cost, WCET.
$D_i$: Deadline.
$P_i$: Period, minimum interval between successive job releases.

Utilisation of a “task set”: $U = \sum_{i=1}^{n} C_i / P_i$
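
The utilisation of a task set is conventionally the sum of $C_i / P_i$ over its tasks. A minimal Python sketch of that sum follows; the `Task` type, its field names and the two example tasks are hypothetical and only serve as illustration.

```python
from typing import NamedTuple


class Task(NamedTuple):
    cost_us: float      # C_i: cost (WCET) in microseconds
    deadline_us: float  # D_i: relative deadline in microseconds
    period_us: float    # P_i: minimum interval between job releases


def utilisation(tasks):
    """Return the sum of C_i / P_i over the task set."""
    return sum(t.cost_us / t.period_us for t in tasks)


# Hypothetical two-task set, not taken from the slides.
tasks = [
    Task(cost_us=500.0, deadline_us=2000.0, period_us=2000.0),
    Task(cost_us=3000.0, deadline_us=10000.0, period_us=10000.0),
]
print(f"U = {utilisation(tasks):.2f}")
```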

Given a set of tasks $\{\tau_1, \ldots, \tau_n\}$, a scheduling method (fixed-priority, earliest-deadline first) and its parameters, can all tasks be guaranteed to always make their deadline?

Or: for every task in a task set, is the WCRT no greater than its deadline?

Preemption for real-time systems is a trade-off between:
> Lower WCRT for high-priority tasks, and
> Higher context switching overhead.

Can the higher overhead be justified by lower WCRT?

Experiment: determine schedulability of random task sets under preemptive vs. non-preemptive scheduling.

Steps:
1. Determine WCRT of non-preemptive context switch.
2. Estimate conservative WCRT of preemptive context switch.
3. Compare schedulability.

Step 1: Determine WCRT of non-preemptive context switch.

> NVIDIA (2009): "The Fermi pipeline is optimized to reduce the cost of an application context switch to below 25 microseconds."
> But Nouveau cannot change Fermi memory clock.
> Variety of Kepler GPUs (2012-2014) with different:
  > Context size,
  > Maximum memory bandwidth.
> Same conditions:
  > Nouveau: max clock speed,
  > Samples: 20,000,000/run,
  > Resolution: 1600x1200,
  > Workload: 1024x768 OpenArena windowed timedemo.
> (Intrusive measurement, max. observed overhead 224ns.)

NVIDIA GPU (SMs)     | Max bw | State        | Time (us)                                | Avg. utilisation
                     | GiB/s  | KiB          | min          avg          max          | GiB/s  (%)
GeForce GT 710 (1)   | 14.4   | ~63.9        | 9.2          [illegible]  [illegible]  | 2.83   (19.6%)
GeForce GT 640 (2)   | 28.5   | ~68.2        | 13.6         26.5         43.7         | 2.45   (8.6%)
GeForce GTX 650 (2)  | 80.0   | [illegible]  | [illegible]  [illegible]  [illegible]  | 2.74   (3.4%)
GeForce GTX 780 (12) | 288.4  | ~268.6       | 9.7          20.0         28.6         | 13.76  (4.8%)
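
The "Avg. utilisation" column appears to relate the saved state size to the average switch time. The sketch below assumes that reading (state size divided by average time, then compared with the maximum bandwidth) and checks it against the GeForce GT 640 row only; the function name and constants are illustrative.

```python
KIB = 1024
GIB = 1024 ** 3


def avg_bandwidth_gib_s(state_kib, avg_time_us):
    """Bandwidth implied by moving state_kib in avg_time_us (assumed reading)."""
    bytes_moved = state_kib * KIB
    seconds = avg_time_us * 1e-6
    return bytes_moved / seconds / GIB


# GeForce GT 640 row: ~68.2 KiB state, 26.5 us average, 28.5 GiB/s max bandwidth.
bw = avg_bandwidth_gib_s(68.2, 26.5)
share = bw / 28.5
print(f"{bw:.2f} GiB/s ({share * 100:.1f}% of max bandwidth)")
```

For this row the result agrees with the tabulated 2.45 GiB/s (8.6%); the other rows are not checked here.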

Step 2: Estimate conservative WCRT of preemptive context switch.

Assumption: WCRT grows linearly with the size of the context.
Each SM:
> 256KiB registers
> Max. 48KiB local memory

Ex: GeForce GT 640 (2x SM) full-preemption context size:
68.2 + 2 x (256 + 48) = 676.2KiB

Results in the following (conservative) estimates:

Ctxswitch type           | Avg (us) | Max (us)
Non-preemptive (68.2KiB) | 26.5     | 43.7
Preemptive (676.2KiB)    | 262.7    | 433.3
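
The estimate above can be reproduced with a few lines of Python. This is a sketch of the stated linear-scaling assumption only; the function and constant names are illustrative, and the inputs are the GeForce GT 640 figures from the slides.

```python
REGISTERS_KIB_PER_SM = 256  # per-SM register file, from the slide
LOCAL_MEM_KIB_PER_SM = 48   # per-SM maximum local memory, from the slide


def preemptive_context_kib(base_state_kib, num_sms):
    """Full-preemption context: base state plus per-SM registers and local memory."""
    return base_state_kib + num_sms * (REGISTERS_KIB_PER_SM + LOCAL_MEM_KIB_PER_SM)


def scale_linearly(time_us, from_kib, to_kib):
    """Scale a measured switch time by the ratio of context sizes."""
    return time_us * to_kib / from_kib


full_kib = preemptive_context_kib(68.2, num_sms=2)
avg_us = scale_linearly(26.5, 68.2, full_kib)
max_us = scale_linearly(43.7, 68.2, full_kib)
print(f"{full_kib:.1f} KiB: avg {avg_us:.1f} us, max {max_us:.1f} us")
```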

Step 3: Determine schedulability of random task sets.

Parameters:
> Uniprocessor EDF scheduling policy.
> 8.1M random task sets (UUniFast).
> Task set: two tasks, 1000us < $P_i$ < 15000us.
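
For reference, UUniFast draws per-task utilisations that sum to a chosen total. The sketch below follows the commonly published form of the algorithm; drawing periods uniformly from the stated range is an assumption, since the slides do not say how periods were chosen, and `random_taskset` is a hypothetical helper.

```python
import random


def uunifast(n, total_util, rng):
    """Draw n task utilisations that sum to total_util (UUniFast)."""
    utils = []
    remaining = total_util
    for i in range(1, n):
        next_remaining = remaining * rng.random() ** (1.0 / (n - i))
        utils.append(remaining - next_remaining)
        remaining = next_remaining
    utils.append(remaining)
    return utils


def random_taskset(n, total_util, rng, p_min_us=1000.0, p_max_us=15000.0):
    """Return (cost_us, period_us) pairs; uniform periods are an assumption."""
    tasks = []
    for u in uunifast(n, total_util, rng):
        period = rng.uniform(p_min_us, p_max_us)
        tasks.append((u * period, period))
    return tasks


rng = random.Random(0)  # fixed seed so the example is repeatable
taskset = random_taskset(2, 0.5, rng)
print(taskset)
```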

Assumptions:
> NVIDIA Tegra K1-like system (28.5GiB/s, 2x SM).
> Non-preemptive context switch: 1 context switch/job.
> Preemptive context switch: 2 context switches/job.
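
One way to account for the context-switch assumptions is to inflate each job's cost by the switch overhead before testing schedulability. The sketch below does this for the preemptive case only, using the utilisation test U <= 1 for preemptive EDF with implicit deadlines ($D_i = P_i$). Both the implicit deadlines and the choice of test are assumptions; the slides do not state which test was used, and a non-preemptive analysis would additionally need a blocking term that is not modelled here. The example task set is hypothetical.

```python
def edf_schedulable_preemptive(tasks, ctx_switch_us, switches_per_job=2):
    """Utilisation test for preemptive EDF with implicit deadlines.

    tasks is a list of (cost_us, period_us) pairs; each job's cost is
    inflated by switches_per_job context switches.
    """
    inflated = sum(
        (cost + switches_per_job * ctx_switch_us) / period
        for cost, period in tasks
    )
    return inflated <= 1.0


# Hypothetical task set with utilisation 0.8 before overhead.
tasks = [(1500.0, 5000.0), (5000.0, 10000.0)]

ok_avg = edf_schedulable_preemptive(tasks, ctx_switch_us=262.7)  # avg estimate
ok_max = edf_schedulable_preemptive(tasks, ctx_switch_us=433.3)  # max estimate
print(ok_avg, ok_max)
```

With these numbers the set passes under the average estimate and fails under the maximum one, which shows how sensitive the result is to the assumed switch cost.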

Figure: Schedulability (%) with four curves: preemptive (avg), preemptive (max), non-preemptive (avg) and non-preemptive (max).

UNIVERSITY OF CAMBRIDGE

Preemption for real-time systems is a trade-off between:
> Lower WCRT for high-priority tasks, and
> Higher context switching overhead.

Can the higher overhead be justified by lower WCRT?
> Yes, under real-time task (/shader/compute kernel) scheduling.

Figure: NVIDIA GeForce GT 710, context switch time (ns), with maxima labelled Max 1, Max 2 and Max 3.

> Tasks on GPUs are susceptible to performance interference.
> Ex.: display scan-out interferes with context switch.
> Need models to distinguish WCET from WCRT!

Preemptive GPU scheduling for real-time systems

> Use-cases for parallel accelerators in RTS.
  > Autonomous robotics is the driving force.
> Preemptive scheduling: improved WCRT outweighs overhead.
> We need:
  > More control over task (/shader/compute kernel) scheduling policies.
  > More accurate performance interference models.